heuristic learning
On Using Admissible Bounds for Learning Forward Search Heuristics
Núñez-Molina, Carlos, Asai, Masataro, Fernández-Olivares, Juan, Mesejo, Pablo
In recent years, there has been growing interest in utilizing modern machine learning techniques to learn heuristic functions for forward search algorithms. Despite this, there has been little theoretical understanding of what they should learn, how to train them, and why we do so. This lack of understanding has resulted in the adoption of diverse training targets (suboptimal vs optimal costs vs admissible heuristics) and loss functions (e.g., square vs absolute errors) in the literature. In this work, we focus on how to effectively utilize the information provided by admissible heuristics in heuristic learning. We argue that learning from poly-time admissible heuristics by minimizing mean square errors (MSE) is not the correct approach, since its result is merely a noisy, inadmissible copy of an efficiently computable heuristic. Instead, we propose to model the learned heuristic as a truncated Gaussian, where admissible heuristics are used not as training targets but as lower bounds of this distribution. This results in a different loss function from the MSE commonly employed in the literature, which implicitly models the learned heuristic as a Gaussian distribution. We conduct experiments where both MSE and our novel loss function are applied to learning a heuristic from optimal plan costs. Results show that our proposed method converges faster during training and yields better heuristics, with 40% lower MSE on average.
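To make the contrast between the two losses concrete, the following is a minimal sketch, not the authors' implementation. It assumes a network that outputs a mean `mu` and a standard deviation `sigma` per state, and that only a lower bound `lower` (the value of an admissible heuristic) is available; all function and parameter names are hypothetical. With `sigma` held fixed, the Gaussian negative log-likelihood reduces to a scaled squared error, which is the sense in which MSE implicitly assumes a Gaussian. The truncated version adds a term that renormalizes the density over the region at or above the bound.

```python
import math


def gaussian_nll(target, mu, sigma):
    """Negative log-likelihood of `target` under N(mu, sigma^2).

    For a fixed sigma this is the squared error up to scale and shift.
    """
    z = (target - mu) / sigma
    return 0.5 * z * z + math.log(sigma) + 0.5 * math.log(2.0 * math.pi)


def truncated_gaussian_nll(target, mu, sigma, lower):
    """NLL of `target` under N(mu, sigma^2) truncated to [lower, +inf).

    `lower` plays the role of an admissible heuristic value (an assumption
    of this sketch): the optimal cost can never fall below it.
    """
    if target < lower:
        raise ValueError("target lies below the lower bound")
    alpha = (lower - mu) / sigma
    # Probability mass the untruncated Gaussian places on [lower, +inf).
    tail_mass = 0.5 * math.erfc(alpha / math.sqrt(2.0))
    # Floor the mass so the logarithm stays defined when mu is far below `lower`.
    tail_mass = max(tail_mass, 1e-300)
    # Truncated density = Gaussian density / tail_mass, so the NLL gains log(tail_mass).
    return gaussian_nll(target, mu, sigma) + math.log(tail_mass)
```

When the bound is far below the mean, the two losses coincide; the closer the bound is to the prediction, the more the truncated loss differs from the plain Gaussian one.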
Reconnecting with the Ideal Tree: An Alternative to Heuristic Learning in Real-Time Search
Rivera, Nicolas (Pontificia Universidad Catolica de Chile) | Illanes, Leon (Pontificia Universidad Catolica de Chile) | Baier, Jorge A. (Pontificia Universidad Catolica de Chile) | Hernandez, Carlos (Universidad Catolica de la Santisima Concepcion)
In this paper, we present a conceptually simple, easy-to-implement real-time search algorithm suitable for a priori partially known environments. Instead of performing a series of searches towards the goal, like most Real-Time Heuristic Search Algorithms do, our algorithm follows the arcs of a tree T rooted in the goal state that is built initially using the heuristic h. When the agent observes that an arc in the tree cannot be traversed in the actual environment, it removes such an arc from T and our algorithm carries out a reconnection search whose objective is to find a path between the current state and any node in T. The reconnection search need not be guided by h, since the search objective is not to encounter the goal. Furthermore, h need not be updated. We implemented versions of our algorithm that utilize various blind search algorithms for reconnection. We show experimentally that these implementations significantly outperform state-of-the-art real-time heuristic search algorithms for the task of pathfinding in grids. In grids, our algorithms, which do not incorporate any geometrical knowledge, naturally behave similarly to a bug algorithm, moving around obstacles and never returning to areas that have been visited in the past. In addition, we prove theoretical properties of the algorithm.
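The follow-then-reconnect loop described in this abstract can be illustrated with a small sketch. This is not the authors' algorithm; it is one possible reading under stated assumptions: a 4-connected grid, a Manhattan-distance h, an agent that senses blocked cells only when adjacent to them, a free goal cell, a tree whose default parent pointer is the neighbor with the smallest h, and breadth-first search as the blind reconnection search. All names are hypothetical.

```python
from collections import deque

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def neighbors(cell, width, height):
    x, y = cell
    for dx, dy in MOVES:
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)


def follow_ideal_tree(start, goal, walls, width, height):
    """Return the cells visited from start to goal, or None if no path is found."""

    def h(cell):  # Manhattan distance; never updated during the run
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    known_walls = set()  # blocked cells the agent has observed so far
    override = {}        # parent pointers rewritten by reconnection searches

    def parent(cell):
        # Default arc of the tree: the neighbor that looks best according to h.
        if cell in override:
            return override[cell]
        return min(neighbors(cell, width, height), key=h)

    def in_tree(cell):
        # A cell belongs to T if its parent chain reaches the goal
        # without passing through a cell known to be blocked.
        seen = set()
        while cell != goal:
            if cell in seen or cell in known_walls:
                return False
            seen.add(cell)
            cell = parent(cell)
        return True

    def reconnect(source):
        # Blind BFS from the current cell to any cell still connected to T.
        prev = {source: None}
        queue = deque([source])
        while queue:
            cell = queue.popleft()
            if cell != source and in_tree(cell):
                while prev[cell] is not None:  # point the found path into T
                    override[prev[cell]] = cell
                    cell = prev[cell]
                return True
            for nxt in neighbors(cell, width, height):
                if nxt not in known_walls and nxt not in prev:
                    prev[nxt] = cell
                    queue.append(nxt)
        return False

    path, current = [start], start
    while current != goal:
        # Sense: adjacent blocked cells become known, which removes arcs into them.
        known_walls.update(n for n in neighbors(current, width, height) if n in walls)
        if not in_tree(current) and not reconnect(current):
            return None
        current = parent(current)
        path.append(current)
    return path
```

For example, `follow_ideal_tree((0, 0), (4, 0), {(2, 0), (2, 1)}, 5, 5)` walks toward the goal, discovers the blocked cells as it reaches them, and detours around them; h itself is never modified, only the parent pointers along each reconnection path.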